Abstract

Consider the nonparametric regression model $Y = m(X) + \varepsilon$, where the function $m$ is smooth but
unknown, and $\varepsilon$ is independent of $X$. An estimator of the density of the error term $\varepsilon$ is proposed and its
weak consistency is obtained. The contribution of this paper is twofold. First, we evaluate the impact of
the estimation of the regression function on the error density estimator. Second, optimal choices of
the first-step and second-step bandwidths, used for estimating the regression function and the error density
respectively, are proposed. Further, we investigate the asymptotic normality of the error density estimator
and evaluate its performance in simulated examples.

Keywords: Two-step estimator, first-step bandwidth, second-step bandwidth.

1 Introduction 

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample of independent replicates of the random vector $(X, Y)$, where $Y$ is the
univariate dependent variable and $X$ is the covariate of dimension $d$. Let $m(\cdot)$ be the conditional expectation
of $Y$ given $X$ and let $\varepsilon$ be the related regression error term, so that the regression error model is

\[ Y = m(X) + \varepsilon, \tag{1.1} \]

where $\varepsilon$ is assumed to have mean zero and to be statistically independent of $X$, and the function $m(\cdot)$
is smooth but unknown. In this paper, we investigate the problem of nonparametric estimation of the
probability density function (p.d.f.) of the error term $\varepsilon$. The difficulty of this study is that the
regression error term is not observed and must be estimated. In such a setting, it would be unwise to estimate
the error density by means of the conditional approach, which is based on the probability distribution function
of the response variable given the covariate. Indeed, this approach is affected by the curse of dimensionality,
so that the resulting estimator of the residual density would have a considerably slower rate of convergence when
the dimension of the explanatory variable is high. The strategy used here is based on the estimated
residuals, which are built from the nonparametric estimator of the regression function $m(\cdot)$. The proposed
estimator for the density of $\varepsilon$ is built by using the estimated residuals as if they were the true errors,
and the weak consistency of this estimator is obtained. Our results may have many possible applications.
First, the estimator of the density $f(\cdot)$ of the residual term $\varepsilon$ is an important tool for understanding the
behavior of the residuals and therefore the fit of the regression model. Indeed, this estimator can be used
for goodness-of-fit tests of a specified error distribution in a parametric or nonparametric regression setting.
Some examples can be found in Loynes (1980), Akritas and Van Keilegom (2001), and Cheng and Sun (2008).
Second, the estimation of $f(\cdot)$ can be useful for testing the symmetry of the distribution of the residuals. See
Ahmad and Li (1997), Dette, Kusi-Appiah and Neumeyer (2002), Neumeyer and Dette (2007) and references
therein. Note also that the estimation of the error density is useful for forecasting $Y$ by means of a mode
approach, since the mode of the p.d.f. of $Y$ given $X = x$ is $m(x) + \arg\max_{e \in \mathbb{R}} f(e)$. Another interest in estimating
$f(\cdot)$ is the construction of nonparametric estimators of the hazard function of $Y$ given $X$ (see Van Keilegom
and Veraverbeke, 2002), or the estimation of the density of the response variable $Y$ (see Escanciano and
Jacho-Chavez, 2010).

Many estimators of the p.d.f. of the regression error $\varepsilon$ can be obtained from estimation of the regression
function and the conditional p.d.f. of $Y$ given $X$. For the estimation of the latter, see Roussas (1967, 1991)
and Youndje (1996), among others. More direct approaches have also been proposed. Akritas and Van
Keilegom (2001) estimate the cumulative distribution function of the regression error in a heteroscedastic
model with univariate covariates. The estimator they propose is based on a nonparametric estimation of
the residuals. Their results show the impact of the estimation of the residuals on the limit distribution of
the underlying estimator of the cumulative distribution function. The results obtained by Akritas and Van
Keilegom (2001) are generalized by Neumeyer and Van Keilegom (2010) to the same model with
multivariate covariates. Müller, Schick and Wefelmeyer (2004) consider the estimation of moments of the
regression error. Quite surprisingly, under appropriate conditions, the estimator based on the true errors
is less efficient than the estimator which uses the nonparametric estimated residuals. The reason is that
the latter estimator makes better use of the fact that the regression error $\varepsilon$ has mean zero. Fu and Yang (2008)
study the asymptotic normality of kernel error density estimators in parametric nonlinear autoregressive
models. They show that at a fixed point, the distribution of these error density estimators is normal without
knowing the nonlinear autoregressive function. Wang, Brown, Cai and Levine (2008) investigate the impact
of the estimation of the regression function on the estimator of the variance function in a heteroscedastic
model. In their study, they show that for a good estimation of the variance function, it is important to use
a very small bandwidth, and so a weakly biased estimator for the regression function of their model. Cheng
(2005) establishes the asymptotic normality of an estimator of $f(\cdot)$ based on the estimated residuals. This
estimator is constructed by splitting the sample into two parts: the first part is used for the construction
of the estimator of $f(\cdot)$, while the second part of the sample is used for the estimation of the residuals.
Efromovich (2005) proposes an adaptive estimator of the error density, based on a density estimator proposed
by Pinsker (1980). Although these authors used the estimated residuals for constructing an estimator of
the error density, none of them investigated the impact of the dimension of the covariate on the estimation
of $f(\cdot)$, nor the influence of the first-step bandwidth used to estimate $m(\cdot)$ on the estimator of the error
density.

The contribution of this paper is twofold. First, we evaluate the impact of the estimation of the 
regression function on the error density estimator. Second, the optimal choices of the first and second step 
bandwidths used for estimating the regression function and the residual density respectively, are proposed. To 
this end, the difference between the feasible estimator which uses the estimated residuals, and the unfeasible 
one based on the true errors is established. Further, we investigate the asymptotic normality of the feasible 
estimator and evaluate its performance through a simulation study. 

The rest of this paper is organized as follows. Section 2 presents our estimators and some notations 
used in the sequel. Sections 3 and 4 group our assumptions and main results respectively. Section 5 is 
devoted to the simulations. Some concluding remarks are given in Section 6, while the proofs of our results 
are gathered in Section 7 and in an appendix. 



2 Construction of the estimators and notations 

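The approach proposed here for the nonparametric kernel estimation of $f(e)$ is based on a two-step
procedure, which builds, in a first step, the estimated residuals

\[ \hat\varepsilon_i = Y_i - \hat m_{in}, \quad i = 1, \ldots, n, \tag{2.1} \]

where $\hat m_{in} = \hat m_{in}(X_i)$ is the leave-one-out version of the Nadaraya-Watson (1964) kernel estimator of $m(X_i)$,

\[ \hat m_{in} = \frac{\sum_{j \neq i} Y_j K_0\left(\frac{X_j - X_i}{b_0}\right)}{\sum_{j \neq i} K_0\left(\frac{X_j - X_i}{b_0}\right)}. \tag{2.2} \]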

Here $K_0(\cdot)$ is a kernel function defined on $\mathbb{R}^d$ and $b_0 = b_0(n)$ is a bandwidth sequence. It is tempting to
use, in the second step, the estimated $\hat\varepsilon_i$ as if they were the true residuals $\varepsilon_i$. This would ignore the fact
that the $\hat m_{in}(X_i)$'s can result in severely biased estimates of the $m(X_i)$'s for those $X_i$ which are close to the
boundaries of the support $\mathcal{X}$ of the covariate distribution. That is why our proposed estimator trims the
observations $X_i$ outside an inner subset $\mathcal{X}_0$ of $\mathcal{X}$,

\[ \hat f_n(e) = \frac{1}{b_1 \sum_{i=1}^n 1(X_i \in \mathcal{X}_0)} \sum_{i=1}^n 1(X_i \in \mathcal{X}_0) K_1\left(\frac{\hat\varepsilon_i - e}{b_1}\right), \tag{2.3} \]

where $K_1(\cdot)$ is a univariate kernel function and $b_1 = b_1(n)$ is a bandwidth sequence. This estimator is the
so-called two-step kernel estimator of $f(e)$. In principle, it would be possible to assume that most of the
$X_i$'s fall in $\mathcal{X}_0$ when this set is very close to $\mathcal{X}$. This would give an estimator close to the more natural
kernel estimator $\sum_{i=1}^n K_1\left((\hat\varepsilon_i - e)/b_1\right)/(nb_1)$. However, in the rest of the paper, a fixed subset $\mathcal{X}_0$ will be
considered for the sake of simplicity.

Observe that the two-step kernel estimator $\hat f_n(e)$ is a feasible estimator in the sense that it does not
depend on any unknown quantity, as is desirable in practice. This contrasts with the unfeasible ideal kernel
estimator

\[ \tilde f_n(e) = \frac{1}{b_1 \sum_{i=1}^n 1(X_i \in \mathcal{X}_0)} \sum_{i=1}^n 1(X_i \in \mathcal{X}_0) K_1\left(\frac{\varepsilon_i - e}{b_1}\right), \tag{2.4} \]

which depends in particular on the unknown regression error terms. It is however intuitively clear that a
proportion of the estimated residuals (those with $X_i$ not close to the boundary of $\mathcal{X}$) yield a density estimator
rivaling the one based on the corresponding proportion of the true errors.
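The two-step construction described above (leave-one-out Nadaraya-Watson fit, residuals, trimmed kernel density) can be sketched as follows for a scalar covariate. This is only an illustrative sketch: the function names, the biweight kernel used for both steps, and the interval `[lo, hi]` standing in for the inner set are assumptions of the example, not part of the paper's definitions. Observations whose leave-one-out fit is undefined (no neighbor within the bandwidth) are skipped, which is a convenience of the sketch.

```python
import math

def biweight(u):
    """Biweight kernel (15/16)(1 - u^2)^2 on (-1, 1); an illustrative kernel choice."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def loo_nadaraya_watson(x, y, b0, kernel=biweight):
    """Leave-one-out Nadaraya-Watson fits at each X_i (scalar covariate)."""
    n = len(x)
    fits = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j != i:  # leave observation i out
                w = kernel((x[j] - x[i]) / b0)
                num += w * y[j]
                den += w
        fits.append(num / den if den > 0.0 else float("nan"))
    return fits

def two_step_density(e, x, y, b0, b1, lo, hi, kernel=biweight):
    """Trimmed kernel density estimate at e built from estimated residuals.

    Only observations with lo <= X_i <= hi (the inner set) contribute.
    """
    fits = loo_nadaraya_watson(x, y, b0, kernel)
    total, count = 0.0, 0
    for xi, yi, mi in zip(x, y, fits):
        if lo <= xi <= hi and not math.isnan(mi):
            total += kernel((yi - mi - e) / b1)  # residual is Y_i minus the fit
            count += 1
    return total / (b1 * count)
```

With a constant response the residuals vanish, so the estimate at 0 reduces to the kernel value at 0 divided by the second-step bandwidth, which gives a quick sanity check of the implementation.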

In the sequel we will denote by $\varphi^{(k)}$ the $k$th derivative of any function $\varphi$ which is $k$ times differentiable.

3 Assumptions 

The assumptions we need for the proofs of the main results are listed below for convenient reference. 

(A1) The support $\mathcal{X}$ of $X$ is a subset of $\mathbb{R}^d$, $\mathcal{X}_0$ has a nonempty interior and the closure of $\mathcal{X}_0$ is in the
interior of $\mathcal{X}$.

(A2) The p.d.f. $g(\cdot)$ of the i.i.d. covariates $X_i$ is strictly positive over $\mathcal{X}_0$ and has continuous second order
partial derivatives over $\mathcal{X}$.

(A3) The regression function $m(\cdot)$ has continuous second order partial derivatives over $\mathcal{X}$.

(A4) The i.i.d. centered regression error terms $\varepsilon_i$ have finite 6th moments and are independent of the
covariates $X_i$.

(A5) The probability density function $f(\cdot)$ of the $\varepsilon_i$'s has bounded continuous second order derivatives over
$\mathbb{R}$ and satisfies $\sup_{e \in \mathbb{R}} |h_p^{(k)}(e)| < \infty$, where $h_p(e) = e^p f(e)$, $p \in [0, 2]$ and $k \in \{0, 1, 2\}$.

(A6) The kernel function $K_0$ is symmetric, continuous over $\mathbb{R}^d$ with support contained in $[-1/2, 1/2]^d$ and
satisfies $\int K_0(z)dz = 1$.

(A7) The kernel function $K_1$ is symmetric, has a compact support, is three times continuously differentiable
over $\mathbb{R}$, and satisfies $\int K_1(v)dv = 1$, $\int K_1^{(\ell)}(v)dv = 0$ for $\ell = 1, 2, 3$, and $\int vK_1^{(\ell)}(v)dv = 0$ for $\ell = 2, 3$.

(A8) The bandwidth $b_0$ decreases to 0 when $n \to \infty$ and satisfies, for $d^* = \sup\{d + 2, 2d\}$, $nb_0^{d^*}/\ln n \to \infty$
and $\ln(1/b_0)/\ln(\ln n) \to \infty$ when $n \to \infty$.

(A9) The bandwidth $b_1$ decreases to 0 and satisfies $n^{(d+8)} b_1^{7(d+4)} \to \infty$ when $n \to \infty$.

Assumptions (A2), (A3) and (A5) impose that all the functions to be estimated nonparametrically have two
bounded derivatives. Consequently the conditions $\int zK_0(z)dz = 0$ and $\int vK_1(v)dv = 0$, as assumed in (A6)
and (A7), represent standard conditions ensuring that the biases of the resulting nonparametric estimators
(2.2) and (2.4) are of order $b_0^2$ and $b_1^2$. Assumption (A4) states independence between the regression error
terms and the covariates, and the existence of the moments of $\varepsilon$ up to the sixth order. This
assumption simplifies the proofs of the asymptotic expansion of the estimator $\hat f_n(e)$. The
differentiability of $K_1$ imposed in Assumption (A7) is more specific to our two-step estimation method. This
assumption is used to expand the two-step kernel estimator $\hat f_n(e)$ in (2.3) around the unfeasible one $\tilde f_n(e)$
from (2.4), using the residual estimation errors $\hat\varepsilon_i - \varepsilon_i$ and the derivatives of $K_1$ up to the third order. Assumption
(A8) is useful for obtaining the uniform convergence of the Nadaraya-Watson estimator of $m$ (see for instance
Einmahl and Mason, 2005), and also gives a similar consistency result for the leave-one-out estimator $\hat m_{in}$
in (2.2). Assumption (A9) is needed in the study of the difference between the feasible estimator $\hat f_n(e)$ and
the unfeasible estimator $\tilde f_n(e)$.

4 Main results 

This section is devoted to our main results. The first result we give here concerns the pointwise consistency
of the nonparametric kernel estimator $\hat f_n$ of the error density $f$. Next, the optimal first-step and second-step
bandwidths used to estimate $f$ are proposed. We finish this section by establishing the asymptotic
normality of the estimator $\hat f_n$.
4.1 Pointwise weak consistency



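The following result gives the order of the difference between the feasible estimator and the theoretical error
density for all $e \in \mathbb{R}$.

Theorem 4.1. Under (A1)-(A9), we have, for all $e \in \mathbb{R}$, and $b_0$ and $b_1$ going to 0,

\[ \hat f_n(e) - f(e) = O_P\left(AMSE(b_1) + R_n(b_0, b_1)\right)^{1/2}, \]

where

\[ AMSE(b_1) = E_n\left[\left(\tilde f_n(e) - f(e)\right)^2\right] = O_P\left(b_1^4 + \frac{1}{nb_1}\right) \]

and

\[ R_n(b_0, b_1) = b_0^4 + \left[\frac{1}{(nb_1^5)^{1/2}} + \left(\frac{b_0^d}{b_1^3}\right)^{1/2}\right]^2 \left(b_0^4 + \frac{1}{nb_0^d}\right)^2 + \left[\frac{1}{b_1} + \left(\frac{b_0^d}{b_1^7}\right)^{1/2}\right]^2 \left(b_0^4 + \frac{1}{nb_0^d}\right)^3. \]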



The result of Theorem 4.1 is based on the evaluation of the difference between $\hat f_n(e)$ and $\tilde f_n(e)$. This
evaluation gives an indication of the impact of the estimation of the residuals on the nonparametric
estimation of the regression error density. The remainder term $R_n(b_0, b_1)$ comes from the replacement of the
unknown $m(X_i)$ in $\varepsilon_i$ by the estimate $\hat m_{in}(X_i)$.



4.2 Optimal first-step and second-step bandwidths for the pointwise weak consistency

The next result, Theorem 4.2, gives some guidelines for the choice of the optimal bandwidth $b_0$
used in the nonparametric estimation of the regression errors. As far as we know, the optimal choice for
$b_0$ has not been investigated before in the nonparametric literature. In what follows, $a_n \asymp b_n$ means that
$a_n = O(b_n)$ and $b_n = O(a_n)$, i.e. that there is a constant $C > 0$ such that $|a_n|/C \le |b_n| \le C|a_n|$ for $n$ large
enough.

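Theorem 4.2. Assume (A1)-(A9) and define

\[ b_0^* = b_0^*(b_1) = \arg\min_{b_0} R_n(b_0, b_1), \]

where the minimization is performed over bandwidths $b_0$ fulfilling (A8). Then,

\[ b_0^* \asymp \max\left\{ \left(\frac{1}{n^2 b_1^3}\right)^{\frac{1}{d+4}}, \left(\frac{1}{n^3 b_1^7}\right)^{\frac{1}{2d+4}} \right\} \]

and

\[ R_n(b_0^*, b_1) \asymp \max\left\{ \left(\frac{1}{n^2 b_1^3}\right)^{\frac{4}{d+4}}, \left(\frac{1}{n^3 b_1^7}\right)^{\frac{4}{2d+4}} \right\}. \]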
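As a rough numerical illustration of the order given by Theorem 4.2, the first-step bandwidth can be computed from $n$, $b_1$ and $d$ as below. The sketch assumes the reading of the theorem in which $b_0^*$ is of the order of the larger of $(n^2 b_1^3)^{-1/(d+4)}$ and $(n^3 b_1^7)^{-1/(2d+4)}$; the function name and the multiplicative constant `c0` are hypothetical, since the theorem only fixes the order and not the constant.

```python
def first_step_bandwidth(n, b1, d, c0=1.0):
    """c0 times the order of the first-step bandwidth b0* read from Theorem 4.2.

    The constant c0 is not identified by the theory; it is a tuning choice.
    """
    term_a = (n ** 2 * b1 ** 3) ** (-1.0 / (d + 4))
    term_b = (n ** 3 * b1 ** 7) ** (-1.0 / (2 * d + 4))
    return c0 * max(term_a, term_b)
```

For instance, with $d = 1$ and $b_1 = n^{-1/5}$ the second term dominates and the result equals $n^{-4/15}$.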



Our next theorem gives the conditions under which the estimator $\hat f_n(e)$ reaches the optimal rate $n^{-2/5}$ when
$b_0$ takes the value $b_0^*$. We prove that for $d \le 2$, the bandwidth that minimizes the term $AMSE(b_1) + R_n(b_0^*, b_1)$
has the same order as $n^{-1/5}$, yielding the optimal order $n^{-2/5}$ for $(AMSE(b_1) + R_n(b_0^*, b_1))^{1/2}$. Note that
the order $n^{-2/5}$ is the optimal rate achieved by the optimal kernel estimator of a univariate density. See,
for instance, Bosq and Lecoutre (1987), Scott (1992) or Wand and Jones (1995).

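Theorem 4.3. Assume (A1)-(A9) and let

\[ b_1^* = \arg\min_{b_1} \left( AMSE(b_1) + R_n(b_0^*, b_1) \right), \]

where $b_0^* = b_0^*(b_1)$ is defined as in Theorem 4.2. Then,

i. For $d \le 2$, we have

\[ b_1^* \asymp \left(\frac{1}{n}\right)^{\frac{1}{5}} \]

and

\[ \left( AMSE(b_1^*) + R_n(b_0^*, b_1^*) \right)^{1/2} \asymp \left(\frac{1}{n}\right)^{\frac{2}{5}}. \]

ii. For $d \ge 3$, we have

\[ b_1^* \asymp \left(\frac{1}{n}\right)^{\frac{3}{2d+11}} \]

and

\[ \left( AMSE(b_1^*) + R_n(b_0^*, b_1^*) \right)^{1/2} \asymp \left(\frac{1}{n}\right)^{\frac{6}{2d+11}}. \]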

The results of Theorem 4.3 show that the rate $n^{-2/5}$ is reachable if and only if $d \le 2$. These results are
derived from Theorem 4.2. The latter implies that if $b_1 \asymp n^{-1/5}$, then $b_0^*$ has the same order as

\[ n^{-\frac{1}{5} \times \frac{4}{d+2}}. \]

For $d \le 2$, this order of $b_0^*$ is smaller than the one of the optimal bandwidth $b_0$ obtained for the nonparametric
kernel estimation of $m(\cdot)$. Indeed, it has been shown in Nadaraya (1989, Chapter 4) that the optimal
bandwidth $b_0$ needed for the kernel estimation of $m(\cdot)$ satisfies $b_0 \asymp n^{-1/(d+4)}$. For $d = 1$, the order of $b_0^*$ is
$n^{-(1/5) \times (4/3)}$, which goes to 0 slightly faster than $n^{-1/5}$, the optimal order of the bandwidth $b_0$. For $d = 2$,
the order of $b_0^*$ is $n^{-1/5}$. Again this order goes to 0 faster than the order $n^{-1/6}$ of the optimal bandwidth for
the nonparametric kernel estimation of the regression function with two covariates. This suggests that for
$d = 1$ and $d = 2$, the ideal bandwidth $b_0$ needed to estimate the residual terms should be very small. This
finding parallels Wang, Brown, Cai and Levine (2008), who show that a similar result holds when estimating
the conditional variance of a heteroscedastic regression error term. However, Wang et al. (2008) do not give
the order of the optimal bandwidth to be used for estimating the regression function in their heteroscedastic
setup.
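The comparison of bandwidth orders in the paragraph above can be checked with exact rational arithmetic. The sketch below assumes the two candidate orders of Theorem 4.2, $(n^2 b_1^3)^{-1/(d+4)}$ and $(n^3 b_1^7)^{-1/(2d+4)}$, with $b_1$ of order $n^{-1/5}$; the function names are hypothetical.

```python
from fractions import Fraction

def b0_star_exponent(d, b1_exp=Fraction(1, 5)):
    """Return a such that b0* is of order n^(-a) when b1 is of order n^(-b1_exp)."""
    a_first = (2 - 3 * b1_exp) / (d + 4)        # from (n^2 b1^3)^(-1/(d+4))
    a_second = (3 - 7 * b1_exp) / (2 * d + 4)   # from (n^3 b1^7)^(-1/(2d+4))
    # The larger of n^(-a_first) and n^(-a_second) has the smaller exponent.
    return min(a_first, a_second)

def regression_optimal_exponent(d):
    """Exponent of the classical optimal bandwidth n^(-1/(d+4)) for estimating m."""
    return Fraction(1, d + 4)

for d in (1, 2):
    print(d, b0_star_exponent(d), regression_optimal_exponent(d))
```

A larger exponent means a bandwidth that shrinks faster, so the printed pairs reproduce the statement that, for $d = 1$ and $d = 2$, the first-step bandwidth should be smaller than the one that is optimal for estimating $m$ itself.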

For $d \ge 3$, we do not achieve the convergence rate $n^{-2/5}$ for our proposed estimator $\hat f_n(e)$. However, we note
that $b_0$ goes to 0 more slowly than $b_0^*$. This shows that the convergence rate obtained for $\hat f_n(e)$ is better than the
optimal rate achieved in the case of a classical kernel estimator of a multivariate density.
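To make the rate comparison concrete, the exponents can be tabulated exactly. The sketch below uses the rate $n^{-6/(2d+11)}$ from Theorem 4.3 for $d \ge 3$ and, as an assumption of this example, takes $n^{-2/(d+4)}$ as the benchmark rate of a kernel estimator of a $d$-dimensional density with two derivatives; the text does not state which multivariate dimension it has in mind, and the function names are hypothetical.

```python
from fractions import Fraction

def two_step_rate_exponent(d):
    """Exponent r such that the two-step estimator converges at rate n^(-r)."""
    return Fraction(2, 5) if d <= 2 else Fraction(6, 2 * d + 11)

def multivariate_kde_rate_exponent(d):
    """Assumed benchmark: rate n^(-2/(d+4)) for a d-dimensional kernel density estimator."""
    return Fraction(2, d + 4)

for d in range(1, 7):
    print(d, two_step_rate_exponent(d), multivariate_kde_rate_exponent(d))
```

Under this benchmark the two-step exponent is strictly larger for every $d \ge 3$, and the two branches of `two_step_rate_exponent` agree at $d = 2$, where $6/(2d+11) = 2/5$.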

All these results show that the best estimator $\hat m_n$ of $m$ needed for estimating $f$ should use a very small
bandwidth $b_0$. This suggests that $\hat m_n$ should be less biased and should have a higher variance than the
optimal nonparametric kernel estimator of $m$. Consequently, estimators of $m$ with smaller bias should
be preferred in our framework, compared to the case where the regression function $m$ is the parameter of
interest. Indeed, in our case, as in Wang et al. (2008), the square of the bias matters more than the
variance.

4.3 Asymptotic normality 

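Our last result concerns the asymptotic normality of the estimator $\hat f_n(e)$.

Theorem 4.4. Assume (A1)-(A9) and

(A10): $nb_0^{d+4} = O(1)$, $nb_0^4 b_1 = o(1)$, $nb_0^d b_1^3 \to \infty$,

when $n$ goes to $\infty$. Then,

\[ \sqrt{nb_1}\left(\hat f_n(e) - \bar f_n(e)\right) \xrightarrow{d} N\left(0, \frac{f(e)}{\mathbb{P}(X \in \mathcal{X}_0)} \int K_1^2(v)dv\right), \]

where

\[ \bar f_n(e) = f(e) + \frac{b_1^2}{2} f^{(2)}(e) \int v^2 K_1(v)dv + o\left(b_1^2\right). \]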
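One practical use of the normal limit in Theorem 4.4 is a plug-in pointwise interval. The sketch below is only illustrative: it replaces $f(e)$ in the asymptotic variance by the estimate itself, it is centered at the estimate and therefore ignores the bias term of order $b_1^2$ contained in the centering sequence, and the default value $5/7$ is the integral of the squared biweight kernel $(15/16)(1 - v^2)^2$. The function name and arguments are hypothetical.

```python
import math

def normal_interval(f_hat, n, b1, p_inner, kernel_l2=5.0 / 7.0, z=1.96):
    """Plug-in interval f_hat +/- z * sqrt(f_hat * int(K1^2) / (p_inner * n * b1)).

    p_inner stands for P(X in X_0); kernel_l2 for the integral of K1 squared.
    """
    se = math.sqrt(f_hat * kernel_l2 / (p_inner * n * b1))
    return f_hat - z * se, f_hat + z * se
```

In practice `p_inner` could be replaced by the fraction of observations falling in the inner set, which is again an assumption of this sketch rather than a recommendation from the paper.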
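Note that for $d \le 2$, $b_1 = b_1^*$ and $b_0 = b_0^*$, Theorems 4.2 and 4.3 imply that

\[ b_0^* \asymp \left(\frac{1}{n}\right)^{\frac{8}{5(2d+4)}}, \quad b_1^* \asymp \left(\frac{1}{n}\right)^{\frac{1}{5}}, \]

which yields

\[ nb_0^{*(d+4)} \asymp \left(\frac{1}{n}\right)^{\frac{12-2d}{5(2d+4)}}, \quad nb_0^{*4} b_1^* \asymp \left(\frac{1}{n}\right)^{\frac{16-8d}{5(2d+4)}}, \quad nb_0^{*d} b_1^{*3} \asymp n^{\frac{8-4d}{5(2d+4)}}. \]

This shows that for $d = 1$, Assumption (A10) is realizable with the bandwidths $b_0^*$ and $b_1^*$. But with these
bandwidths, the last constraint of (A10) is not satisfied for $d = 2$, since $nb_0^{*d} b_1^{*3}$ is bounded as $n \to \infty$.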
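The remark on the last constraint of (A10) can be verified by exponent bookkeeping. The sketch assumes that the last constraint reads $nb_0^d b_1^3 \to \infty$ and that, for $d \le 2$, the optimal bandwidths are of order $n^{-8/(5(2d+4))}$ and $n^{-1/5}$; the function name is hypothetical.

```python
from fractions import Fraction

def last_constraint_exponent(d):
    """Exponent c such that n * b0^d * b1^3 is of order n^c at the optimal bandwidths."""
    b0_exp = Fraction(8, 5 * (2 * d + 4))  # b0* of order n^(-b0_exp), assumed for d <= 2
    b1_exp = Fraction(1, 5)                # b1* of order n^(-1/5)
    return 1 - d * b0_exp - 3 * b1_exp

print(last_constraint_exponent(1), last_constraint_exponent(2))
```

A strictly positive exponent means the quantity diverges, while a zero exponent means it stays bounded, which matches the distinction drawn between $d = 1$ and $d = 2$.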

5 Simulations 

In this section we report simulation results evaluating the finite sample behavior of the estimators $\hat f_n$ and $\tilde f_n$.
In two examples, we evaluate the performance of these estimators in terms of biases, variances and
mean square errors. The first example concerns a one-dimensional regression model (univariate covariate),
while the second example is devoted to a regression model with a three-dimensional covariate.

5.1 Univariate case 

We work with the following data generating model

\[ Y = 1 + \sin(\pi X) + \varepsilon, \tag{5.1} \]

where $\varepsilon \sim N(0, 1)$ and $X \sim U[0, 1]$. We use the kernel $K_j(x) = (15/16)(1 - x^2)^2 1(|x| \le 1)$ ($j = 0, 1$).
Our results are based on 300 simulation runs. For the bandwidth choice, we consider the results of Theorems
4.2 and 4.3 and take

\[ b_1 = \hat b_1, \quad b_0 = c_0 \times \max\left\{ \left(\frac{1}{n^2 \hat b_1^3}\right)^{\frac{1}{d+4}}, \left(\frac{1}{n^3 \hat b_1^7}\right)^{\frac{1}{2d+4}} \right\}, \]

where $d = 1$, $c_0$ is a given constant in $[0, 1]$ and $\hat b_1 = 1.06 \times \hat\sigma_\varepsilon \times n^{-1/5}$ is Silverman's (1986) rule of
thumb bandwidth for the estimator $\tilde f_n$. Here $\hat\sigma_\varepsilon$ is the average standard deviation of the generated errors.
For the estimators $\hat f_n$ and $\tilde f_n$, we consider $\mathcal{X}_0 = [\delta, 1 - \delta]$, $\delta = 0.001$.
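The univariate design can be reproduced along the following lines. This is a minimal sketch, assuming the model $Y = 1 + \sin(\pi X) + \varepsilon$ with standard normal errors and a uniform covariate; the function names and the seeding scheme are hypothetical, and the sample standard deviation of one error sample stands in for the averaged standard deviation mentioned in the text.

```python
import math
import random

def simulate_univariate(n, seed=0):
    """Draw (X_i, Y_i, eps_i) from Y = 1 + sin(pi X) + eps, eps ~ N(0,1), X ~ U[0,1]."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + math.sin(math.pi * xi) + ei for xi, ei in zip(x, eps)]
    return x, y, eps

def silverman_bandwidth(errors):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) from a sample of errors."""
    n = len(errors)
    mean = sum(errors) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / (n - 1))
    return 1.06 * sd * n ** (-0.2)
```

Repeating the draw with different seeds, for example 300 times, and averaging the squared estimation errors at a few evaluation points would give Monte Carlo bias, variance and mean square error figures of the kind reported in the tables.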

TABLE 1 HERE 

In Table 1 we give some values of the bias, variance and mean square error of $\hat f_n(e)$ at the points $e = -1$,
0 and 1 for different sample sizes. For each sample, the values are calculated for $c_0 = 0.25$, 0.5 and 1. From
Table 1 we see that our method seems to work well, since the variance and mean square error of $\hat f_n(e)$ are
very close to 0. We also observe that the performance of $\hat f_n(e)$ does not appear to be very sensitive to the choice
of the constant $c_0$, since the variations of the variance and the mean square error are practically negligible.
Further, we note that for $e = -1, 1$ and $n = 100$ the variance and the mean square error of $\hat f_n(e)$ are smaller
than the ones of $\tilde f_n(e)$. This fact parallels the surprising situation noticed in Müller, Schick and Wefelmeyer
(2004) for the nonparametric kernel estimation of moments of the regression error.

FIGURE 1 HERE 

Figure 1 compares the curves of $\tilde f_n$ and $\hat f_n$ for $c_0 = 1$ and for sample sizes $n = 50$ and $n = 100$. We
observe almost no difference between the performances of these two estimators. This suggests that
the estimators $\tilde f_n$ and $\hat f_n$ are asymptotically equivalent when $n \to \infty$.

5.2 Trivariate case 

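We consider the model

\[ Y = 1 + X_1 + X_2^2 + \sin(\pi X_3) + \varepsilon, \tag{5.2} \]

where $\varepsilon \sim N(0, 1)$ and $X_1, X_2, X_3 \sim U[0, 1]$. As in the univariate case, our study is based on 300 simulation
runs. We use the kernels $K_1(x) = (15/16)(1 - x^2)^2 1(|x| \le 1)$, $K_0(z_1, z_2, z_3) = \prod_{j=1}^3 K_1(z_j)$ and consider
$\mathcal{X}_0 = [\delta, 1 - \delta]^3$, $\delta = 0.001$. We use the bandwidths

\[ b_1 = \hat b_1, \quad b_0 = c_0 \times \max\left\{ \left(\frac{1}{n^2 \hat b_1^3}\right)^{\frac{1}{d+4}}, \left(\frac{1}{n^3 \hat b_1^7}\right)^{\frac{1}{2d+4}} \right\} = c_0 \left(\frac{1}{n^3 \hat b_1^7}\right)^{\frac{1}{2d+4}}, \]

where $d = 3$, $c_0 \in [0, 1]$ and $\hat b_1$ is computed, as in the univariate case, from the average standard deviation of the generated errors.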
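The trivariate design and the product kernel can be sketched as follows. The regression function $1 + x_1 + x_2^2 + \sin(\pi x_3)$ used here is an assumption of the example (the exact power on the second covariate is hard to read in the source), and the function names are hypothetical.

```python
import math
import random

def biweight(u):
    """Univariate biweight kernel (15/16)(1 - u^2)^2 on (-1, 1)."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def product_kernel(z):
    """Product kernel K0(z_1, ..., z_d) built from the univariate kernel."""
    value = 1.0
    for zj in z:
        value *= biweight(zj)
    return value

def simulate_trivariate(n, seed=0):
    """Draw covariates on [0,1]^3 and responses from the assumed trivariate model."""
    rng = random.Random(seed)
    xs = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1.0 + x1 + x2 ** 2 + math.sin(math.pi * x3) + e
          for (x1, x2, x3), e in zip(xs, eps)]
    return xs, ys, eps
```

A leave-one-out Nadaraya-Watson fit in three dimensions would then weight each $Y_j$ by `product_kernel` applied to the componentwise differences $(X_j - X_i)/b_0$.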
FIGURE 2 HERE 

Figure 2 compares the curves of $\hat f_n(e)$ and $\tilde f_n(e)$ for $c_0 = 1$ and sample sizes $n = 100$ and $n = 200$.
We note a difference between the curves in the neighborhood of the point $e = 0$. But this difference
is less important for $n = 200$. This suggests that for $e$ very close to 0, the difference between $\hat f_n(e)$ and $\tilde f_n(e)$
should be negligible only when the sample size is large enough.

TABLE 2 HERE 

In Table 2 we give some values of the bias, variance and mean square error of $\hat f_n(e)$ and $\tilde f_n(e)$ for
$c_0 = 0.25$, 0.5 and 1. We see that the mean square error of $\hat f_n(e)$ is greater than the one of $\tilde f_n(e)$. Further,
we observe that the performance of $\hat f_n(e)$ appears to be sensitive to the choice of the constant $c_0$. For example,
for $e = 0$, $c_0 = 0.5$ and $c_0 = 1$, the mean square error of $\hat f_n(e)$ is very high compared to the sum of the
variance and the square of the bias.

6 Conclusion 

The aim of this paper was to investigate the nonparametric kernel estimation of the probability density
function of the regression error using the estimated residuals. First, we evaluated the impact of the
estimation of the regression function on the error density estimator. To this aim, the difference between the
feasible density estimator based on the estimated residuals and the unfeasible one using the true errors was
investigated. Second, the optimal choices of the first-step and second-step bandwidths used for estimating the
regression function and the error density were proposed. Further, we established the asymptotic normality
of the feasible estimator. The strategy used here to estimate the error density is based on a two-step
procedure which, in a first step, replaces the unobserved residual terms by the nonparametric estimators
$\hat\varepsilon_i = Y_i - \hat m_n(X_i)$, where $\hat m_n(X_i)$ is a nonparametric estimator of $m(X_i)$. In a second step, the estimated
residuals $\hat\varepsilon_i$ are used to estimate the error density $f(\cdot)$, as if they were the true $\varepsilon_i$'s. Though proceeding in this way may
remedy the curse of dimensionality for large sample sizes, a challenging issue was to evaluate the impact of
the estimated residuals on the estimation of $f(\cdot)$, and to find the order of the optimal first-step bandwidth
$b_0$ used for estimating the error terms. For the choice of $b_0$, our results show that the ideal bandwidth
should be smaller than the optimal bandwidth for the nonparametric kernel estimation of $m(\cdot)$. This means
that the best estimator of $m(\cdot)$ needed for estimating $f(\cdot)$ should have a lower bias and a higher variance
than the classical kernel regression estimator. With this ideal choice of $b_0$, we establish that for $d \le 2$, the
estimator $\hat f_n(e)$ of $f(e)$ can attain the convergence rate $n^{-2/5}$, which corresponds to the optimal consistency
rate achieved by the univariate kernel density estimator. For $d \ge 3$, the rate $n^{-2/5}$ is not reachable by our
estimator $\hat f_n(e)$. However, the rate we obtain for $\hat f_n(e)$ is better than the optimal one achieved in the case
of the kernel estimation of a multivariate density.
7 Proofs section



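Intermediate Lemmas for Theorem 4.1

Lemma 7.1. Define, for $x \in \mathcal{X}_0$,

\[ \hat g_n(x) = \frac{1}{nb_0^d} \sum_{j=1}^n K_0\left(\frac{X_j - x}{b_0}\right), \quad \bar g_n(x) = E\left[\hat g_n(x)\right]. \]

Then under (A1)-(A2), (A6) and (A8), we have, when $b_0$ goes to 0,

\[ \sup_{x \in \mathcal{X}_0} |\bar g_n(x) - g(x)| = O\left(b_0^2\right), \quad \sup_{x \in \mathcal{X}_0} |\hat g_n(x) - \bar g_n(x)| = O_P\left(b_0^4 + \frac{\ln n}{nb_0^d}\right)^{1/2}, \]

and

\[ \sup_{x \in \mathcal{X}_0} \left| \frac{1}{\hat g_n(x)} - \frac{1}{g(x)} \right| = O_P\left(b_0^4 + \frac{\ln n}{nb_0^d}\right)^{1/2}. \]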



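Lemma 7.2. Set

\[ f_{in}(e) = \frac{1(X_i \in \mathcal{X}_0)}{b_1 \mathbb{P}(X \in \mathcal{X}_0)} K_1\left(\frac{\varepsilon_i - e}{b_1}\right). \]

Then under (A4), (A5) and (A7), we have, for $b_1$ going to 0, and for some constant $C > 0$,

\[ E f_{in}(e) = f(e) + \frac{b_1^2}{2} f^{(2)}(e) \int v^2 K_1(v)dv + o\left(b_1^2\right), \]

\[ \mathrm{Var}\left(f_{in}(e)\right) = \frac{f(e)}{b_1 \mathbb{P}(X \in \mathcal{X}_0)} \int K_1^2(v)dv + o\left(\frac{1}{b_1}\right), \]

\[ E\left|f_{in}(e) - E f_{in}(e)\right|^3 \le \frac{C f(e)}{b_1^2 \mathbb{P}^2(X \in \mathcal{X}_0)} \int |K_1(v)|^3 dv + o\left(\frac{1}{b_1^2}\right). \]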



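Lemma 7.3. Define

\[ S_n = \sum_{i=1}^n 1(X_i \in \mathcal{X}_0) \left(\hat m_{in} - m(X_i)\right) K_1^{(1)}\left(\frac{\varepsilon_i - e}{b_1}\right), \]

\[ T_n = \sum_{i=1}^n 1(X_i \in \mathcal{X}_0) \left(\hat m_{in} - m(X_i)\right)^2 K_1^{(2)}\left(\frac{\varepsilon_i - e}{b_1}\right), \]

\[ R_n = \sum_{i=1}^n 1(X_i \in \mathcal{X}_0) \left(\hat m_{in} - m(X_i)\right)^3 \int_0^1 (1 - t)^2 K_1^{(3)}\left(\frac{\varepsilon_i - t(\hat m_{in} - m(X_i)) - e}{b_1}\right) dt. \]

Then under (A1)-(A9), we have, for $b_0$ and $b_1$ small enough,

\[ S_n = O_P\left[ b_0^2 \left(nb_1^2 + (nb_1)^{1/2}\right) + \left(nb_1^4 + \frac{b_1}{b_0^d}\right)^{1/2} \right], \]

\[ T_n = O_P\left[ \left(nb_1^3 + (nb_1)^{1/2} + \left(n^2 b_0^d b_1^3\right)^{1/2}\right) \left(b_0^4 + \frac{1}{nb_0^d}\right) \right], \]

\[ R_n = O_P\left[ \left(nb_1^3 + \left(n^2 b_0^d b_1\right)^{1/2}\right) \left(b_0^4 + \frac{1}{nb_0^d}\right)^{3/2} \right]. \]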



Lemma 7.4. Under (A5) and (A7) we have, for some constant $C > 0$, and for any $e$ in $\mathbb{R}$ and $p \in [0, 2]$,

2 

' < Cbf, (7.1) 



61 



R (2) I £^ , eP/(e)(ie 



<CT>i, 
<C6i, 
<CT>i, 



( £^ ) ePf{e)de 



K 



bi 



e p f(e)de 



?f(e)dt 



<Cbl 
< Cbf. 



(7.2) 
(7.3) 



Lemma 7.5. Let 



where 



Xj — X{ 
bo 



gin 



1 ™ 



nb o j~( V 60 

iiti 



Xj - Xi 



Then, under (A1)-(A9), we have, when $b_0$ and $b_1$ go to 0,

) (ej-e 



bi 



Op(b 2 ) (nbl + inh) 1 ^ 2 



Lemma 7.6. Let 



I (Xi £ Xp) ^ 



A, - X t 



Then, under (Ai) — (Ag), we have 

n 



1/2 
Lemma 7.7. Let $E_n[\cdot]$ be the conditional mean given $X_1, \ldots, X_n$. Then under (A1)-(A9), we have

i 2 

\4 



sup E n 

Ki<n 



sup J 

Ki<n 



1 {Xi G Ab) (m in - m(X l )) 
1 (A, G Xo) (m m - to(AO) 6 



1 



Op ^ 



1 



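Lemma 7.8. Assume that (A4) and (A6) hold. Then, for any $1 \le i \neq j \le n$, and for any $e$ in $\mathbb{R}$,

\[ \left(\hat m_{in} - m(X_i), \varepsilon_i\right) \quad \text{and} \quad \left(\hat m_{jn} - m(X_j), \varepsilon_j\right) \]

are independent given $X_1, \ldots, X_n$, provided that $\|X_i - X_j\| \ge Cb_0$, for some constant $C > 0$.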

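Lemma 7.9. Let $\mathrm{Var}_n(\cdot)$ and $\mathrm{Cov}_n(\cdot)$ be respectively the conditional variance and the conditional covariance
given $X_1, \ldots, X_n$, and set

\[ \zeta_{in} = 1(X_i \in \mathcal{X}_0) \left(\hat m_{in} - m(X_i)\right)^2 K_1^{(2)}\left(\frac{\varepsilon_i - e}{b_1}\right). \]

Then under (A1)-(A9), we have, for $n$ going to infinity,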



n , 1 

E Var ™ (Cm) = Op (nfti) (6g + — 



n n / 1 

£ E Cov » (C™ , On ) = Or (nXb\ /2 ) K + ~TS 



i=i 3=1 



All these lemmas are proved in Appendix A. 



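Proof of Theorem 4.1

The proof of the theorem is based on the following equalities:

\[ \hat f_n(e) - \tilde f_n(e) = O_P\left[ b_0^2 + \left(\frac{1}{n} + \frac{1}{n^2 b_0^d b_1^3}\right)^{1/2} \right] + O_P\left[ \frac{1}{(nb_1^5)^{1/2}} + \left(\frac{b_0^d}{b_1^3}\right)^{1/2} \right] \left(b_0^4 + \frac{1}{nb_0^d}\right) + O_P\left[ \frac{1}{b_1} + \left(\frac{b_0^d}{b_1^7}\right)^{1/2} \right] \left(b_0^4 + \frac{1}{nb_0^d}\right)^{3/2} \tag{7.4} \]

and

\[ \tilde f_n(e) - f(e) = O_P\left(b_1^4 + \frac{1}{nb_1}\right)^{1/2}. \tag{7.5} \]

Indeed, since $\hat f_n(e) - f(e) = (\tilde f_n(e) - f(e)) + (\hat f_n(e) - \tilde f_n(e))$, it then follows by (7.5) and (7.4) that

\[ \hat f_n(e) - f(e) = O_P\left[ b_1^4 + \frac{1}{nb_1} + b_0^4 + \frac{1}{n} + \frac{1}{n^2 b_0^d b_1^3} \right]^{1/2} + O_P\left[ \frac{1}{(nb_1^5)^{1/2}} + \left(\frac{b_0^d}{b_1^3}\right)^{1/2} \right] \left(b_0^4 + \frac{1}{nb_0^d}\right) + O_P\left[ \frac{1}{b_1} + \left(\frac{b_0^d}{b_1^7}\right)^{1/2} \right] \left(b_0^4 + \frac{1}{nb_0^d}\right)^{3/2}. \]

This yields the result of the theorem, since under (A8) and (A9), we have

\[ \frac{1}{n} = O\left(\frac{1}{nb_1}\right), \quad \frac{1}{n^2 b_0^d b_1^3} = O\left( \frac{b_0^d}{b_1^3} \left(b_0^4 + \frac{1}{nb_0^d}\right)^2 \right). \]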

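Hence, it remains to prove equalities (7.4) and (7.5). For this, define $S_n$, $R_n$ and $T_n$ as in the statement of
Lemma 7.3. Since $\hat\varepsilon_i - \varepsilon_i = -(\hat m_{in} - m(X_i))$ and $K_1$ is three times continuously differentiable under
(A7), the third-order Taylor expansion with integral remainder gives

\[ \hat f_n(e) - \tilde f_n(e) = \frac{1}{b_1 \sum_{i=1}^n 1(X_i \in \mathcal{X}_0)} \left( -\frac{S_n}{b_1} + \frac{T_n}{2b_1^2} - \frac{R_n}{2b_1^3} \right). \]

Therefore, since

\[ \sum_{i=1}^n 1(X_i \in \mathcal{X}_0) = n\left(\mathbb{P}(X \in \mathcal{X}_0) + o_P(1)\right) \]

by the Law of large numbers, Lemma 7.3 then gives

\[ \hat f_n(e) - \tilde f_n(e) = O_P\left(\frac{1}{nb_1^2}\right) S_n + O_P\left(\frac{1}{nb_1^3}\right) T_n + O_P\left(\frac{1}{nb_1^4}\right) R_n \]

\[ = O_P\left[ b_0^2 \left(1 + \frac{1}{(nb_1^3)^{1/2}}\right) + \left(\frac{1}{n} + \frac{1}{n^2 b_0^d b_1^3}\right)^{1/2} \right] + O_P\left[ 1 + \frac{1}{(nb_1^5)^{1/2}} + \left(\frac{b_0^d}{b_1^3}\right)^{1/2} \right] \left(b_0^4 + \frac{1}{nb_0^d}\right) + O_P\left[ \frac{1}{b_1} + \left(\frac{b_0^d}{b_1^7}\right)^{1/2} \right] \left(b_0^4 + \frac{1}{nb_0^d}\right)^{3/2}. \]

This yields (7.4), since under (A8) and (A9), we have $b_0 \to 0$, $nb_0^{d+2} \to \infty$ and $nb_1^3 \to \infty$, so that

\[ \frac{b_0^2}{(nb_1^3)^{1/2}} = O\left(b_0^2\right), \quad b_0^4 + \frac{1}{nb_0^d} = O\left(b_0^2\right). \]

For (7.5), note that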



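\[ E_n\left[\left(\tilde f_n(e) - f(e)\right)^2\right] = \mathrm{Var}_n\left(\tilde f_n(e)\right) + \left(E_n\left[\tilde f_n(e)\right] - f(e)\right)^2, \tag{7.6} \]

with, using (A4),

\[ \mathrm{Var}_n\left(\tilde f_n(e)\right) = \frac{1}{\left(b_1 \sum_{i=1}^n 1(X_i \in \mathcal{X}_0)\right)^2} \sum_{i=1}^n 1(X_i \in \mathcal{X}_0) \mathrm{Var}\left(K_1\left(\frac{\varepsilon - e}{b_1}\right)\right). \]

Therefore, since the Cauchy-Schwarz inequality gives

\[ \mathrm{Var}\left(K_1\left(\frac{\varepsilon - e}{b_1}\right)\right) \le E\left[K_1^2\left(\frac{\varepsilon - e}{b_1}\right)\right] = b_1 \int K_1^2(v) f(e + b_1 v)dv, \]

this bound and the equality above yield, under (A5) and (A7),

\[ \mathrm{Var}_n\left(\tilde f_n(e)\right) = O_P\left(\frac{1}{nb_1}\right). \tag{7.7} \]

For the second term in (7.6), we have

\[ E_n\left[\tilde f_n(e)\right] = \frac{1}{b_1} E\left[K_1\left(\frac{\varepsilon - e}{b_1}\right)\right]. \tag{7.8} \]

By (A7), $K_1$ is symmetric, has a compact support, with $\int vK_1(v)dv = 0$ and $\int K_1(v)dv = 1$. Therefore, since
$f(\cdot)$ has bounded continuous second order derivatives under (A5), this yields for some $\theta = \theta(e, b_1 v)$,

\[ E\left[K_1\left(\frac{\varepsilon - e}{b_1}\right)\right] = b_1 \int K_1(v) f(e + b_1 v)dv \]

\[ = b_1 \int K_1(v) \left[ f(e) + b_1 v f^{(1)}(e) + \frac{b_1^2 v^2}{2} f^{(2)}(e + \theta b_1 v) \right] dv \]

\[ = b_1 f(e) + \frac{b_1^3}{2} \int v^2 K_1(v) f^{(2)}(e + \theta b_1 v)dv. \]

Hence this equality and (7.8) give

\[ E_n\left[\tilde f_n(e)\right] = f(e) + \frac{b_1^2}{2} \int v^2 K_1(v) f^{(2)}(e + \theta b_1 v)dv, \tag{7.9} \]

so that

\[ E_n\left[\tilde f_n(e)\right] - f(e) = O_P\left(b_1^2\right). \]

Combining this result with (7.7) and (7.6), we obtain, by the Tchebychev inequality,

\[ \tilde f_n(e) - f(e) = O_P\left(b_1^4 + \frac{1}{nb_1}\right)^{1/2}. \]

This proves (7.5) and completes the proof of the theorem.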
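The bias expansion used in this proof can be checked numerically. The sketch below is an illustration under assumptions not taken from the paper's proof: it uses the standard normal density for $f$ and the biweight kernel for $K_1$ (for which $\int v^2 K_1(v)dv = 1/7$), and compares a midpoint-rule value of $\int K_1(v) f(e + b_1 v)dv$ with the second-order expansion $f(e) + (b_1^2/2) f^{(2)}(e) \int v^2 K_1(v)dv$. The function names are hypothetical.

```python
import math

def biweight(v):
    """Biweight kernel (15/16)(1 - v^2)^2 on (-1, 1)."""
    return 15.0 / 16.0 * (1.0 - v * v) ** 2 if abs(v) < 1.0 else 0.0

def normal_pdf(e):
    """Standard normal density, used here as an example error density."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)

def smoothed_density(e, b1, steps=4000):
    """Midpoint-rule value of int K1(v) f(e + b1 v) dv over the kernel support [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        v = -1.0 + (k + 0.5) * h
        total += biweight(v) * normal_pdf(e + b1 * v)
    return total * h

def second_order_expansion(e, b1):
    """f(e) + (b1^2 / 2) f''(e) int v^2 K1(v) dv, with f'' of the standard normal."""
    second_derivative = (e * e - 1.0) * normal_pdf(e)
    return normal_pdf(e) + 0.5 * b1 * b1 * second_derivative / 7.0
```

For a bandwidth such as $b_1 = 0.2$ the two quantities differ only by terms of order $b_1^4$, which is what the $O_P(b_1^2)$ bound on the bias relies on.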
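Proof of Theorem 4.2

Recall that

\[ R_n(b_0, b_1) = b_0^4 + \left[\frac{1}{(nb_1^5)^{1/2}} + \left(\frac{b_0^d}{b_1^3}\right)^{1/2}\right]^2 \left(b_0^4 + \frac{1}{nb_0^d}\right)^2 + \left[\frac{1}{b_1} + \left(\frac{b_0^d}{b_1^7}\right)^{1/2}\right]^2 \left(b_0^4 + \frac{1}{nb_0^d}\right)^3, \]

and note that

\[ \max\left\{ \left(\frac{1}{n^2 b_1^3}\right)^{\frac{4}{d+4}}, \left(\frac{1}{n^3 b_1^7}\right)^{\frac{4}{2d+4}} \right\} = \left(\frac{1}{n^2 b_1^3}\right)^{\frac{4}{d+4}} \]

if and only if $n^{4-d} b_1^{d+16} \to \infty$. To find the order of $b_0^*$, we shall deal with the cases $nb_0^{d+4} \to \infty$ and
$nb_0^{d+4} = O(1)$.

First assume that $nb_0^{d+4} \to \infty$. More precisely, we suppose that $b_0$ is in $[(u_n/n)^{1/(d+4)}, +\infty)$, where $u_n \to \infty$.
Since $1/(nb_0^d) = O(b_0^4)$ for all these $b_0$, we have

\[ \left(b_0^4 + \frac{1}{nb_0^d}\right)^2 \asymp \left(b_0^4\right)^2, \quad \left(b_0^4 + \frac{1}{nb_0^d}\right)^3 \asymp \left(b_0^4\right)^3. \]

Hence the order of $b_0^*$ is computed by minimizing the function

\[ b_0 \mapsto b_0^4 + \left[\frac{1}{(nb_1^5)^{1/2}} + \left(\frac{b_0^d}{b_1^3}\right)^{1/2}\right]^2 \left(b_0^4\right)^2 + \left[\frac{1}{b_1} + \left(\frac{b_0^d}{b_1^7}\right)^{1/2}\right]^2 \left(b_0^4\right)^3. \]

Since this function is increasing with $b_0$, the minimum of $R_n(\cdot, b_1)$ is achieved for $b_0^{**} = (u_n/n)^{1/(d+4)}$. We
shall prove later on that this choice of $b_0^{**}$ is irrelevant compared to the one arising when $nb_0^{d+4} = O(1)$.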

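Consider now the case $nb_0^{d+4} = O(1)$, i.e. $b_0^4 = O\left(1/(nb_0^d)\right)$. This gives

\[ \left(b_0^4 + \frac{1}{nb_0^d}\right)^2 \asymp \frac{1}{n^2 b_0^{2d}}, \quad \left(b_0^4 + \frac{1}{nb_0^d}\right)^3 \asymp \frac{1}{n^3 b_0^{3d}}. \]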



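Moreover if $nb_0^d b_1^4 \to \infty$, we have, since $nb_0^{2d} \to \infty$ under (A8),

\[ \frac{1}{nb_1^5} \cdot \frac{1}{n^2 b_0^{2d}} = O\left(\frac{b_0^d}{b_1^3} \cdot \frac{1}{n^2 b_0^{2d}}\right), \quad \frac{1}{b_1^2} \cdot \frac{1}{n^3 b_0^{3d}} = O\left(\frac{b_0^d}{b_1^3} \cdot \frac{1}{n^2 b_0^{2d}}\right), \quad \frac{b_0^d}{b_1^7} \cdot \frac{1}{n^3 b_0^{3d}} = O\left(\frac{b_0^d}{b_1^3} \cdot \frac{1}{n^2 b_0^{2d}}\right). \]

Hence the order of $b_0^*$ is obtained by finding the minimum of the function $b_0^4 + 1/(n^2 b_0^d b_1^3)$. The minimization
of this function gives a solution $b_0$ such that

\[ b_0 \asymp \left(\frac{1}{n^2 b_1^3}\right)^{\frac{1}{d+4}}, \quad R_n(b_0, b_1) \asymp \left(\frac{1}{n^2 b_1^3}\right)^{\frac{4}{d+4}}. \]

This value satisfies the constraints $nb_0^{d+4} = O(1)$ and $nb_0^d b_1^4 \to \infty$ when $n^{4-d} b_1^{d+16} \to \infty$.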
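If now $nb_0^{d+4} = O(1)$ but $nb_0^d b_1^4 = O(1)$, we have, since $nb_0^{2d} \to \infty$,

\[ \frac{1}{nb_1^5} \cdot \frac{1}{n^2 b_0^{2d}} = O\left(\frac{1}{n^3 b_0^{2d} b_1^7}\right), \quad \frac{b_0^d}{b_1^3} \cdot \frac{1}{n^2 b_0^{2d}} = O\left(\frac{1}{n^3 b_0^{2d} b_1^7}\right), \quad \frac{1}{b_1^2} \cdot \frac{1}{n^3 b_0^{3d}} = O\left(\frac{1}{n^3 b_0^{2d} b_1^7}\right). \]

In this case, $b_0^*$ is obtained by minimizing the function $b_0^4 + 1/(n^3 b_0^{2d} b_1^7)$, for which the solution $b_0$ verifies

\[ b_0 \asymp \left(\frac{1}{n^3 b_1^7}\right)^{\frac{1}{2d+4}}, \quad R_n(b_0, b_1) \asymp \left(\frac{1}{n^3 b_1^7}\right)^{\frac{4}{2d+4}}. \]

This solution fulfills the constraint $nb_0^d b_1^4 = O(1)$ when $n^{4-d} b_1^{d+16} = O(1)$. Hence we can conclude that for
$b_0^4 = O\left(1/(nb_0^d)\right)$, the bandwidth $b_0^*$ satisfies

\[ b_0^* \asymp \max\left\{ \left(\frac{1}{n^2 b_1^3}\right)^{\frac{1}{d+4}}, \left(\frac{1}{n^3 b_1^7}\right)^{\frac{1}{2d+4}} \right\}, \]

which leads to

\[ R_n(b_0^*, b_1) \asymp \max\left\{ \left(\frac{1}{n^2 b_1^3}\right)^{\frac{4}{d+4}}, \left(\frac{1}{n^3 b_1^7}\right)^{\frac{4}{2d+4}} \right\}. \]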

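We now need to compare the solution $b_0^*$ to the candidate $b_0^{**} = (u_n/n)^{1/(d+4)}$ obtained when $nb_0^{d+4} \to \infty$.
For this, we must compare the orders of $R_n(b_0^*, b_1)$ and $R_n(b_0^{**}, b_1)$. Since $R_n(b_0, b_1) \ge b_0^4$
for all $b_0$, we have $R_n(b_0^{**}, b_1) \ge (u_n/n)^{4/(d+4)}$, so that

\[ \frac{R_n(b_0^*, b_1)}{R_n(b_0^{**}, b_1)} \le C \left[ \left(\frac{1}{n^2 b_1^3}\right)^{\frac{4}{d+4}} + \left(\frac{1}{n^3 b_1^7}\right)^{\frac{4}{2d+4}} \right] \left(\frac{n}{u_n}\right)^{\frac{4}{d+4}} \]

\[ = o(1) + O\left(\frac{1}{u_n}\right)^{\frac{4}{d+4}} \left(\frac{1}{n^{(d+8)} b_1^{7(d+4)}}\right)^{\frac{4}{(2d+4)(d+4)}} = o(1), \]

using the fact that $n^{(d+8)} b_1^{7(d+4)} \to \infty$ by (A9) and that $u_n \to \infty$. This shows that $R_n(b_0^*, b_1) \le R_n(b_0^{**}, b_1)$ for
all $b_1$ and $n$ large enough. Hence the theorem is proved, since $b_0^*$ is the best candidate for the minimization
of $R_n(\cdot, b_1)$. □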

Proof of Theorem 4.3

Recall that Theorem 4.2 gives

$$
AMSE(b_1) + R_n(b_0^*, b_1) \asymp r_1(b_1) + r_2(b_1) + r_3(b_1) = F(b_1),
$$

where

$$
r_1(h) = h^4 + \frac{1}{nh}, \qquad \arg\min r_1(h) \asymp n^{-\frac{1}{5}} = h_1^*, \qquad \min r_1(h) \asymp (h_1^*)^4 = n^{-\frac{4}{5}},
$$

$$
r_2(h) = h^4 + \frac{1}{n^{\frac{8}{d+4}} h^{\frac{12}{d+4}}}, \qquad \arg\min r_2(h) \asymp n^{-\frac{2}{d+7}} = h_2^*, \qquad \min r_2(h) \asymp (h_2^*)^4 = n^{-\frac{8}{d+7}},
$$

$$
r_3(h) = h^4 + \frac{1}{n^{\frac{12}{2d+4}} h^{\frac{28}{2d+4}}}, \qquad \arg\min r_3(h) \asymp n^{-\frac{3}{2d+11}} = h_3^*, \qquad \min r_3(h) \asymp (h_3^*)^4 = n^{-\frac{12}{2d+11}}.
$$

Each $r_j(h)$ decreases on $[0, \arg\min r_j(h)]$ and increases on $(\arg\min r_j(h), \infty)$, and $r_j(h) \asymp h^4$ on
$(\arg\min r_j(h), \infty)$. Moreover $\min r_2(h) = o(\min r_3(h))$ and $h_2^* = o(h_3^*)$ for all possible dimensions $d$, so that
$\min\{r_2(h) + r_3(h)\} \asymp (h_3^*)^4 = n^{-\frac{12}{2d+11}}$ and $\arg\min\{r_2(h) + r_3(h)\} \asymp h_3^* = n^{-\frac{3}{2d+11}}$.

Observe now that $\min\{r_2(h) + r_3(h)\} = O(\min r_1(h))$ is equivalent to $n^{-\frac{12}{2d+11}} = O(n^{-4/5})$, which holds
if and only if $d \leq 2$. Hence assume that $d \leq 2$. Since $n^{-\frac{12}{2d+11}} = O(n^{-4/5})$ also gives $\arg\min\{r_2(h) + r_3(h)\} \asymp
h_3^* = O(h_1^*)$, we have

$$
\min F(b_1) \asymp n^{-4/5} \quad \text{and} \quad \arg\min F(b_1) \asymp n^{-1/5}.
$$

The case $d > 2$ is symmetric with

$$
\min F(b_1) \asymp n^{-\frac{12}{2d+11}} \quad \text{and} \quad \arg\min F(b_1) \asymp n^{-\frac{3}{2d+11}}.
$$

This ends the proof of the theorem. □
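The exponents in this proof can be checked with exact rational arithmetic. The sketch below assumes only the generic fact that a function of the form $h^4 + 1/(n^p h^q)$ is minimized at $h \asymp n^{-p/(4+q)}$; the helper names are hypothetical. The term whose minimizer shrinks most slowly dictates the order of $\arg\min F$, and $\min F$ is of the order of that minimizer to the fourth power.

```python
from fractions import Fraction


def term_exponents(d):
    """Exponents a_j such that argmin r_j(h) is of order n^(-a_j), j = 1, 2, 3.

    Each r_j(h) = h^4 + 1/(n^p h^q) has its minimizer of order n^(-p/(4+q)).
    """
    pq = [
        (Fraction(1), Fraction(1)),                          # r_1
        (Fraction(8, d + 4), Fraction(12, d + 4)),           # r_2
        (Fraction(12, 2 * d + 4), Fraction(28, 2 * d + 4)),  # r_3
    ]
    return [p / (4 + q) for p, q in pq]


def optimal_rate(d):
    """Return (a, r) with argmin F of order n^(-a) and min F of order n^(-r)."""
    a = min(term_exponents(d))  # the slowest-shrinking minimizer dominates
    return a, 4 * a
```

For instance, `optimal_rate(1)` and `optimal_rate(2)` both give the exponents 1/5 and 4/5, while `optimal_rate(3)` gives 3/17 and 12/17, in line with the two cases $d \leq 2$ and $d > 2$.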

Proof of Theorem 4.4

Observe that the Tchebychev inequality gives 

n 

5^1(JT ( G #0) =nP(Ie # ) 



; = 1 







1 + Op 


(\/»). 







so that 



where 



Therefore 



U(e) 







1 + Op 








n 





fn(e), 



Me) = 



Si - e 



nb\¥ (X G Xq) •f—' v V fo i 

/ n (e) - E/ n (e) = (/„(e) - E/„(e)) + (/ n (e) - /„(e)) + O p f-±=J / n (e). (7.10) 
Let now $f_{in}(e)$ be as in the statement of Lemma 7.2 and note that $f_n(e) = (1/n)\sum_{i=1}^n f_{in}(e)$. The second
and the third claims of Lemma 7.2 yield, since $nb_1$ diverges under (A9),



S? =1 E|/<n(e)-E/ <m ( e )| 3 < «MhJ\ K ^ dv + °{^ 
(E^iVar /m (e)) 3/2 ( m/(e) _J K>)dv + Jn\\ 



¥(XeX„)b 



3/2 



0(n&i)- 1/2 =o(l). 



Hence the Lyapounov Central Limit Theorem for triangular arrays (see e.g. Billingsley 1968, Theorem 7.3)
gives, since $nb_1$ diverges under (A9),

/„(e)-E/„(e) _ /»(e)-E/„(e) * 



v /Var/„(e) / Var/ <w ( e ) 

V n 

This yields, using the second result of Lemma 7.2,

/(e) 



TV (0,1). 



/ nbi(/„(e)-E/„(e)) 4JV 0, 



Kl{v)dv 



(7.11) 



Moreover, note that for $nb_0^d b_1^3 \to \infty$ and $nb_0^{2d} \to \infty$, we have



2d 



n6f 



2 / -, Lrf\ / i \ 3 



1 , b d 



b\ b\ 



nb% 



= 



Therefore, since $b_0^4 = O(1/(nb_0^d))$, $nb_0^d b_1^3 \to \infty$ and $nb_0^{2d} \to \infty$ by (A10) and (A8), the equality above and
(7.4) ensure that

1/2 

/«(e) - f n (e) 



Of 



&o + - 



1 , H 



b\ b\) \nb$J \bf b\J \nb d 



i , K 



Op ( &Q + - 



1/2 



Hence for $b_1$ going to 0, we have

Vnh (fn(e) - f n (e) ) = O p 



nbi [b% + - 



1 



1/2 



= Op(l), 



n ' n 2 b^b\ n 3 bl d b 7 lt 

since $nb_0^4 b_1 = o(1)$ and $nb_0^d b_1^3 \to \infty$ under (A10). Combining the above result with (7.11) and (7.10), we
obtain



<nb\ 



(7„(e)-E/ n (e)) 4iVf0, 



/(e) 



Kf{v)dv 



P{x ex 

This ends the proof of the theorem, since the first result of Lemma 7.2 gives

$$
\mathbb{E} f_n(e) = \mathbb{E} f_{in}(e) = f(e) + \frac{b_1^2}{2} f^{(2)}(e) \int v^2 K_1(v)\,dv + o(b_1^2) := \bar f_n(e). \qquad \Box
$$
Appendix A: Proof of the intermediate results

Proof of Lemma 7.1

First note that by (A6), we have $\int zK_0(z)dz = 0$ and $\int K_0(z)dz = 1$. Therefore, since $K_0$ is continuous and
has a compact support, (A1), (A2) and the second-order Taylor expansion yield, for $b_0$ small enough and
any $x$ in $\mathcal{X}_0$,

$$
\begin{aligned}
|\bar g_n(x) - g(x)|
&= \left| \int K_0(z) \left[ g(x + b_0 z) - g(x) \right] dz \right| \\
&= \left| \int K_0(z) \left[ b_0 g^{(1)}(x) z + \frac{b_0^2}{2} z g^{(2)}(x + \theta b_0 z) z^T \right] dz \right|, \quad \theta = \theta(x, b_0 z) \in [0,1], \\
&= \left| b_0 g^{(1)}(x) \int z K_0(z)\,dz + \frac{b_0^2}{2} \int z g^{(2)}(x + \theta b_0 z) z^T K_0(z)\,dz \right| \\
&= \frac{b_0^2}{2} \left| \int z g^{(2)}(x + \theta b_0 z) z^T K_0(z)\,dz \right|
\leq C b_0^2.
\end{aligned}
$$

Therefore

$$
\sup_{x \in \mathcal{X}_0} |\bar g_n(x) - g(x)| = O(b_0^2),
$$

which gives the first result of the lemma. For the two last results of the lemma, it is sufficient to show that

$$
\sup_{x \in \mathcal{X}_0} |\hat g_n(x) - \bar g_n(x)| = O_P\left(\frac{\ln n}{nb_0^d}\right)^{1/2},
$$

since $\bar g_n(x)$ is asymptotically bounded away from 0 over $\mathcal{X}_0$ and $|\bar g_n(x) - g(x)| = O(b_0^2)$ uniformly for $x$
in $\mathcal{X}_0$. This follows from Theorem 1 in Einmahl and Mason (2005). □



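Proof of Lemma 7.2

For the first equality of the lemma, note that by (A4), (A5) and (7.9), we have

$$
\mathbb{E}[f_{in}(e)] = \frac{1}{b_1} \mathbb{E}\left[ K_1\left(\frac{\varepsilon - e}{b_1}\right) \right]
= f(e) + \frac{b_1^2}{2} \int v^2 K_1(v) f^{(2)}(e + \theta b_1 v)\,dv, \quad \theta = \theta(e, b_1 v) \in [0,1].
$$

Therefore the Lebesgue Dominated Convergence Theorem gives, for $b_1$ small enough,

$$
\left| \mathbb{E}[f_{in}(e)] - f(e) - \frac{b_1^2}{2} f^{(2)}(e) \int v^2 K_1(v)\,dv \right|
= \frac{b_1^2}{2} \left| \int v^2 K_1(v) \left( f^{(2)}(e + \theta b_1 v) - f^{(2)}(e) \right) dv \right|
= o(b_1^2).
$$

This proves the first equality of the lemma. For the second and third results of the lemma, the proofs are
straightforward. Hence they are omitted for the sake of brevity. □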
Proof of Lemma 7.3

The order of $S_n$ follows from Lemmas 7.5 and 7.6. Indeed, since

l(X i eX )(m in -m(X i )) = 1{X : d l Xo) {m{X j )+e 3 -m{X l ))Kj X ^ 

nb g-; 



Jin ~l~ j 



Lemmas 7.5 and 7.6 imply that



S n — Of 



% (nb\ + (nb,) 1 / 2 ) + (nb\ 



^4 , h 



1/2 



which gives the desired result for $S_n$.

For the term $T_n$, the order is obtained by computing the conditional mean and the conditional variance
given $X_1, \ldots, X_n$. To this end, define for any $1 \leq i \leq n$,

$$
\mathbb{E}_{in}[\cdot] = \mathbb{E}\left[\,\cdot \mid X_1, \ldots, X_n, \varepsilon_k, k \neq i\right].
$$

Therefore, since $(\hat m_{in} - m(X_i))$ depends only upon $(X_1, \ldots, X_n, \varepsilon_k, k \neq i)$, we have

E n [T n ] = E„ 



1 [X t € Xq) (m m - m{X. l )) 2 K[ 2} < £l 6 



= E„ 



n 

J2 1 {Xi e <*b) (m in - m(A^)) 2 E 



(2) / £ t -e 



with, using (A4) and Lemma 7.4-(7.2),



3, 



(2) £i-e 



bi 



< Cb\. 



Hence the equality above, the Cauchy-Schwarz inequality and Lemma 7.7 yield



|E„[T„]| < CblJ2 ] 



< Cnb\ ( sup E„ 1 (Xi € Xo) (m in - m(X t )) 4 

\l<i<n 

< o P K)^ + _L 

For the conditional variance of $T_n$, Lemma 7.9 gives

n n n 

Var„(T„) = VJ Var„ (£•„) + ^ ^ Cov„ (Cm, On) 



1/2 



i=l 



i=l 3 = 1 



= P (nfe) (tg + ^]+O r (nXbl /2 ) (bt + i 



nb" 



(A.l) 
Therefore, since $b_1$ goes to 0, the order above and (A.1) yield, applying the Tchebychev inequality,

1 \ , , ,1/7 / b\ 



T„ — Op 
= O w 



<» 4 H"° + ^J +( " 6,)1 T J+ ^ + 

(nbl + (nb I ) l,2 + (nW„bl)" 2 ) (b* + 



1 



which gives the result for $T_n$.

We now compute the order of $R_n$. For this, define



I in 



R in = 1 (Xi G Af ) (m m - to(X 4 )) 3 J ire , 
and note that $R_n = \sum_{i=1}^n R_{in}$. The order of $R_n$ is computed in a similar way as for $T_n$. Write



E„ \R,- 



i=l 
n 

5^ 1 (Xi G Af ) (m in - m(X 4 )) 3 E m [4 



with, using (A4) and Lemma 7.4-(7.3),



Ej n 



(l-tf 



rff 



< C6?. 

Therefore the Hölder inequality and Lemma 7.7 yield

n 

\E n [Rn}\ < CblY / E n\\^(X t eX )( 



i=l 



Tflin — m 



< Cb\ E' /4 [l (Xi G Xq) (m in - m(X i )f 



i=i 

< P (n6?)(^ + -L 



3/2 



For the conditional covariance of $R_n$, Lemma 7.8 ensures that

n n n y 

Var„ (R n ) = J2 Var„ (R ln ) + Y,J2( W X * - X oW ^ Cb o ) Cov ™ (^Rju) 



=1 j=i 



Considering the first term above, write 

Var„ (i? m ) < E„ [i£J < E ti 



11 (X, G Ab) (m in - m(X 4 )) 6 E in [/?,] 



(A.2) 



(A.3) 
with, using (A4), the Cauchy-Schwarz inequality and Lemma 7.4-(7.3),



1 ^(3) f £i - t(m in - m(Xj)) - e ^ 



C 



( 3 ) f e - t(m in - m(Xi)) - e 



f{e)de 



dt 



< Cbi. 

Therefore

$$
\mathrm{Var}_n(R_{in}) \leq C b_1 \mathbb{E}_n\left[ 1(X_i \in \mathcal{X}_0) (\hat m_{in} - m(X_i))^6 \right],
$$

uniformly in $i$. Hence Lemma 7.7 implies that

n 

Var„ (Rin) < Cnbi sup E„ [l {X l € X Q ) (m m - m(X t )f] 



< O v {nb 1 )(bt + - p 
For the second term in (A.3), we have

|Cov n (R in , Rj n )\ < (Var„ (R m ) Var„ (R jn )) 1/2 



< Ch sup E„ [1 (X, e Xq) (m in - m(X i )f 

KKn 



Hence from Lemma 7.7 and the Tchebychev inequality, we deduce

n n / \ 

Y^rX^ Xi ~ x ^ - cb ° ) |Gov " Rj 



i=l 3=1 



f 1 \ 3 n ™ / 

< Op (bi) (b$ + — d j Y^^Xi-X^KClK 

l_1 &i 



1 



,2,d\ 



This order, (A.4) and (A.3) give, since $nb_0^d$ diverges under (A8),



Var(i?„)=0 P ^ + _L^\ 



n 2 b d Q bi) 



Finally, with the help of this result and (A.2), we arrive at

3/2 



= Op 
= Op 



+ (" 2 *>" 2 (>J+4) 



3/2' 



nbt) 



3/2' 



.□ 



(A.4) 
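Proof of Lemma 7.4

Set $h_p(e) = e^p f(e)$, $p \in [0, 2]$. For the first inequality of (7.1), note that under (A5) and (A7), the change of
variable $\epsilon = e + b_1 v$ gives, for any integer $\ell \in [1, 3]$,

$$
\left| \int K_1^{(\ell)}\left(\frac{\epsilon - e}{b_1}\right)^2 \epsilon^p f(\epsilon)\,d\epsilon \right|
= \left| b_1 \int K_1^{(\ell)}(v)^2 h_p(e + b_1 v)\,dv \right|
\leq b_1 \sup_{t \in \mathbb{R}} |h_p(t)| \int \left| K_1^{(\ell)}(v)^2 \right| dv
\leq C b_1, \qquad (A.5)
$$

which yields the first inequality in (7.1). For the second inequality in (7.1), observe that $f(\cdot)$ has a bounded
continuous derivative under (A5), and that $\int K_1^{(1)}(v)dv = 0$ by (A7). Therefore, since $h_p(\cdot)$ has bounded
second order derivatives under (A5), the Taylor inequality yields

$$
\left| \int K_1^{(1)}\left(\frac{\epsilon - e}{b_1}\right) h_p(\epsilon)\,d\epsilon \right|
= b_1 \left| \int K_1^{(1)}(v) \left[ h_p(e + b_1 v) - h_p(e) \right] dv \right|
\leq b_1^2 \sup_{t \in \mathbb{R}} |h_p^{(1)}(t)| \int \left| v K_1^{(1)}(v) \right| dv \leq C b_1^2,
$$

which proves (7.1). The first inequalities of (7.2) and (7.3) are given by (A.5). The second bounds in (7.2)
and (7.3) are proved simultaneously. For this, note that under (A7), $K_1(\cdot)$ is symmetric, has a compact
support and two continuous derivatives, with $\int K_1^{(\ell)}(v)dv = 0$ and $\int vK_1^{(\ell)}(v)dv = 0$ for any integer $\ell \in [2, 3]$.
Hence the second order Taylor expansion applied to $h_p(\cdot)$ gives, for some $\theta = \theta(e, b_1 v) \in [0,1]$,

$$
\begin{aligned}
\left| \int K_1^{(\ell)}\left(\frac{\epsilon - e}{b_1}\right) h_p(\epsilon)\,d\epsilon \right|
&= b_1 \left| \int K_1^{(\ell)}(v) \left[ h_p(e + b_1 v) - h_p(e) \right] dv \right| \\
&= b_1 \left| \int K_1^{(\ell)}(v) \left[ b_1 v h_p^{(1)}(e) + \frac{b_1^2 v^2}{2} h_p^{(2)}(e + \theta b_1 v) \right] dv \right| \\
&= \frac{b_1^3}{2} \left| \int v^2 K_1^{(\ell)}(v) h_p^{(2)}(e + \theta b_1 v)\,dv \right| \\
&\leq \frac{b_1^3}{2} \sup_{t \in \mathbb{R}} |h_p^{(2)}(t)| \int \left| v^2 K_1^{(\ell)}(v) \right| dv \leq C b_1^3,
\end{aligned}
$$

which completes the proof of the lemma.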
Proof of Lemma 7.5

By (A4) and Lemma 7.4-(7.1), we have



Var„ 



i=i 



(i) 



£j- e 
h 



Si - e 



E 



< 



1=1 



&1 



< Cnb 1 max |/3 m | , 

Ki<n 



< Cnb\ max 1/3™ I 

Ki<n 



Hence the Tchebychev inequality gives 



(1) 



i=l 



s 8 - e 
h 



P (nbl + (n&i) 1/2 ) max |ft n | 



so that the lemma follows if we can prove that

$$
\sup_{1 \leq i \leq n} |\beta_{in}| = O_P(b_0^2), \qquad (A.6)
$$

as established now. For this, define

(X — \ 1 n 

and F„(a:) = E[£j(x)]/£>o ; so that 

o _ n - 1 ^n(Xj) +77»(X») 

Pin — ^ 

For $\max_{1 \leq i \leq n} |\bar\nu_n(X_i)|$, first observe that a second-order Taylor expansion applied successively to $g(\cdot)$ and
$m(\cdot)$ gives, for $b_0$ small enough, and for any $x$, $z$ in $\mathcal{X}$,



[m(x + boz) — m(x)] g(x + boz) 



b m ( - 1 \x)z + -^zmW(x + db z)z T 



g(x) + b o9 W(x)z + -jzg^(x + Q 2 b z)z 



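for some $\zeta_1 = \zeta_1(x, b_0 z)$ and $\zeta_2 = \zeta_2(x, b_0 z)$ in $[0, 1]$. Therefore, since $\int zK_0(z)dz = 0$ under (A6), it follows
that, by (A1), (A2) and (A3),

$$
\max_{1 \leq i \leq n} |\bar\nu_n(X_i)| \leq \sup_{x \in \mathcal{X}_0} |\bar\nu_n(x)|
= \sup_{x \in \mathcal{X}_0} \left| \int (m(x + b_0 z) - m(x)) K_0(z) g(x + b_0 z)\,dz \right|
\leq C b_0^2. \qquad (A.7)
$$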



Consider now the term $\max_{1 \leq i \leq n} |\nu_{in}(X_i)|$. Using the Bernstein inequality (see e.g. Serfling (2002)), we
have for any $t > 0$,

Pfmax \p in {Xi)\ >t\ < ^P(W in (Xi)\>t)<Y^ F(\u in (x)\>t\Xi=x)g(x)dx 

^ ' 1 = 1 4=1 



< 2ncxp — 



(n- l)t 2 



2sup xe ^Var(g(x)/& d ) + |||i 



2G 



where M is such that sup^g^ |Cj(x)| < M. Hence (A2), (A3), (Aq) and the standard Taylor expansion yield, 
for bo small enough, 

1 f Cb 2 

SUp ^ ' > 1 f T* —1_ -y^ — 1 ) j (nr*\ \ ( "/ \ fi { T* —I— -y \rj •/ <^ 

so that, for any t > 0, 

P ( max |i/ m pf 4 )| > t ) < 2nexp 



sup |Cj(^)| < C^o, sup 

Var(0(a)/6o) < 13 SU P / + M - ™(z)) 2 «o(«)5(ic + b z)dz < " 

(n-l)b d tybl 



\l<i<n 



C + Ct/bo 



This gives 



max |^„(Xi)| > . 



i < 2nexp 



t 2 Inn 



V 



1/2 



0(1), 



provided that $t$ is large enough and under (A8). It then follows that



max \u ln {X l )\ = Op . 

KKn \ nbn 



1/2 



This bound, (A.7) and Lemma 7.1 show that (A.6) is proved, since $b_0^2 \ln n/(nb_0^d) = O(b_0^4)$ under (A8), and
that

o _ n-1 Vin{Xj) +V n (Xj) 

Pin — ^ • '— ' 



9i 



Proof of Lemma 7.6

Note that (A4) gives that $\Sigma_{in}$ is independent of $\varepsilon_i$, and that $\mathbb{E}_n[\Sigma_{in}] = 0$. This yields



(1) 



Sj-e 



(A.8) 



Moreover, write 



Var, 



.i=l 
n 

^Var 



Einivj 



h 

L) f e a -e 
61 



i=l 3 = ! 



(A.9) 
For the sum of variances above, Lemma 7.4-(7.1) and (A4) give



^2 Var « 



z -'%n ri - 1 



(1) / Si 



e, - e 



< ]Te„[E 2 „]E 



(1) / Si 



e, - e 



< 



< 



Cha 2 



EE 



i(X g Xo) ^ 2 fXj - x. 



nb\ 



-Ko 



b 



— V 



i=i 



.9/ 



where $\sigma^2 = \mathbb{E}[\varepsilon^2]$ and



1 " 
raft? ^ 1 



2 / -*AJ — Xi 



For the sum of conditional covariances in (A.9), note that



E E Cov - 

i=l 3=1 



^tn-"- 1 



(1) / ti 



Si - e 



bi 



= EE- 



i=l 3=1 

3Vi 



EE 

1=1 3 = 1 

3^i 



.(i) f Ej-e ^ f 



&1 
I(X i; 6 XQ)l{Xj € A-p)
(nb%) 2 g ln g jn 



EE^ 



fc=i £=1 



Xfc — X; 



Xn 



b 



(A.10) 



where 



= e k K{ 



(l) / £i 



Further, under (A4), it is seen that for k 7^ £, E[^i£,ej] = when Card{i, j, k,£} > 3. Hence the symmetry 
of Kq(-) assumed in (A7) imply that 



E E Cov " 



i=l 3 = 1 



(1) 



£» - e 
61 



|Jj)BAl 1 &! 



EE 



1(X 4 G *o)lpO G *„) 2 f X, - X 



i=l 3=1 

3V> 



QinQjn 

^^ t(x t €X )i(x 3 ex ) 

(nb^) 2 g ln g J7 ^ ° 



E 



(i) 



e — e 



1 = 1 3 = 1 

3Vi 



fe = l 



Xfe — Xi \ / Xfe — X, 
(i)



£ — e 
Therefore, since

$$
\sup_{1 \leq i \leq n} \left( \frac{1}{|\hat g_{in}|} \right) = O_P(1),
$$

by Lemma 7.1, Lemma 7.4-(7.1) then gives



1=1 3 = 1 



(1) / Ei 



£i - e 



(i) / e o 



Ei — e 



bi 



tA s n n ^ 

Op ( ) £ e ^ o) ^" + 0p ( 6 i) £ e 



(A.H) 



where o;„ is defined as in (|AlQ]l and 



9i 



^ n n 



At — X; 



Ao 



Xk — Xj 



In a completely similar way as done for Lemma 7.1, it can be shown that $\tilde g_{in} = O_P(1)$ uniformly in $i$ and for
$n$ large enough. Therefore



J2t(Xi G A- )g m - Op(n). 

i=l 

For the second term in (|A.11|) . the changes of variables x\ — x% + b§z\ and xi — xj, + b^z^ give 



(A.12) 
X I :A <- • V ''.
< E

Cn 3 



< 



A„ 
to

£3 - xi 



A 
A 



A3 — Aj 



X3 - x 2 



bo 



J g{xk)dx k 



k=l 



Cn 3 h 



2d 



Cn, 



so that 

Hence from (A.9)-(A.12), we deduce



n 

Y,nx i ex )% n = o r {n). 



Var„ 



^ ' ^iri Aj 



,4 



6l , 6f 



OvljI + TJ+nbl =0 P [- l+ nb 



°0 °0 



Finally, this order, (A.8) and the Tchebychev inequality ensure that



^ ' Si n aJ 



(1) / Ei 



i=l 



1/2 



d , nb\ ) .□ 
Proof of Lemma 7.7

Define

1 " 
■ 9m = Tihd E K ° 



i=ij¥» 



Xj — Xi 



The proof of the lemma is based on the following bound: 



1 (X, G A"o) (m m - m(Jfi))* 



< C* 



(n^)( fe / 2 )^ 



ft G {4,6}. 



(A.13) 



Indeed, taking successively $k = 4$ and $k = 6$ in (A.13), we have, by (A.6), Lemma 7.1 and (A8),



sup E„ 

l<i<n 

sup E n 

Ki<n 



1 (Xi G Xo) (rhi„ - m(X i )f 



1 {Xi G Xo) (fh m - m(X i )f 



1 



which gives the desired results of the lemma. Hence it remains to prove (A.13). For this, define $\beta_{in}$ and $\Sigma_{in}$
respectively as in the statement of Lemmas 7.5 and 7.6. Since $1(X_i \in \mathcal{X}_0)(\hat m_{in} - m(X_i)) = \beta_{in} + \Sigma_{in}$, and
since $\beta_{in}$ depends only upon $(X_1, \ldots, X_n)$, this gives, for $k \in \{4, 6\}$



<c/4 + cn 



(A.14) 



The order of the second term of bound (A.14) is computed by applying Theorem 2 in Whittle (1960) or the
Marcinkiewicz-Zygmund inequality (see e.g. Chow and Teicher, 2003, p. 386). These inequalities show that
for a linear form $L = \sum_{j=1}^n a_j \xi_j$ with independent mean-zero random variables $\xi_1, \ldots, \xi_n$, it holds that, for
any $k \geq 1$,

$$
\mathbb{E}|L|^k \leq C(k) \left[ \sum_{j=1}^n a_j^2\, \mathbb{E}^{2/k} |\xi_j|^k \right]^{k/2},
$$

where $C(k)$ is a positive real depending only on $k$. Now, observe that for any $i \in [1, n]$,



^ 1 {_Xj G A-p) 

nu oym 



Xj - Xi 



Since under (A4), the $\sigma_{jin}$'s, $j \in [1, n]$, are centered independent variables given $X_1, \ldots, X_n$, this yields, for
any $k \in \{4, 6\}$,

fe/2 



E„ [Si] < CE 



1 (Xi G Aib) 



(n^) 2 $ 2 



^2 / 



Xj - Xi 



< 



Ct (X t eX )g, 
(nbfrWVgi 



fe/2 
Hence this bound and (A.14) give



l(Xi g Xq) (fh m ~m{X l )y 



< c 



III 



{nb d Y k / 2 )gl 



fe/2 



which proves (A.13) and then completes the proof of the lemma. □

Proof of Lemma 7.8

Since $K_0(\cdot)$ has a compact support under (A6), there is a $C > 0$ such that $\|X_i - X_j\| \geq Cb_0$ implies that for
any integer number $k$ of $[1, n]$, $K_0((X_k - X_i)/b_0) = 0$ if $K_0((X_j - X_k)/b_0) \neq 0$. Let $D_j \subset [1, n]$ be such that
an integer number $k$ of $[1, n]$ is in $D_j$ if and only if $K_0((X_j - X_k)/b_0) \neq 0$. Abbreviate $P(\cdot \mid X_1, \ldots, X_n)$ into
$P_n$ and assume that $\|X_i - X_j\| \geq Cb_0$ so that $D_i$ and $D_j$ have an empty intersection. Note also that taking
$C$ large enough ensures that $i$ is not in $D_j$ and $j$ is not in $D_i$. It then follows, under (A4) and since $D_i$ and
$D_j$ only depend upon $X_1, \ldots, X_n$,



Si e A 



(m m - g ^4 and (fh jn - m(Xj),Ej) g £ 

_ ((EkeD^i} im{X k ) - miX,) + e k ) K ((X k - XJ/bo) 



and 



,e 3 - £5 



Si \ eA 



\\ Efeer>i\{i} A 'o ((-Xfc - Xi)/bo) 

xp / f J2eeDj\{j} ( m (X?) - mjXj) + e<) ifr ((X* - X,-)/6q) 
"VV EeeD A{j} Ko((X t -X/)/b ) 
Pn ((m in - m(Xi),£i) ei)x P„ ((m jn - m(Xj),Sj) £ B) . 



g B 



This gives the result of Lemma 7.8, since both $(\hat m_{in} - m(X_i), \varepsilon_i)$ and $(\hat m_{jn} - m(X_j), \varepsilon_j)$ are independent
given $X_1, \ldots, X_n$. □



Proof of Lemma 7.9

Since $\hat m_{in} - m(X_i)$ depends only upon $(X_1, \ldots, X_n, \varepsilon_k, k \neq i)$, we have



J2 Var « (c^) < E E « [&] = E E - 



4=1 



1=1 



i=l 



l(X i eX )(m in -m(X i )yE i . 



K 



(2) 
with, using Lemma 7.4-(7.2),



3, 



(2) / gi-e 
/(e)cte < Cfoi



Therefore these bounds and Lemma 7.7 give

n n 



< Cn6i sup ] 

Ki<n 



< Op (nfti) K + 



1 (X, G Ab) (m in - m(X)) 4 

1 (X, g Xo) (m in - m(Xi)Y 
1 ^ 2 



which yields the desired result for the conditional variance.

We now prepare to compute the order of the conditional covariance. Observe that Lemma 7.8 gives

n n n n y \ * \ 

J2 C0V " (Cin, &«) = Y, E 1 ( W X * - ^J-ll < Cb ° ) ( E » " E n [Q n ] E n [&„] J . 

i=l i=i i=l 3=1 V ' \ ' 

The order of the term above is derived from the following equalities: 

J2^l(\\Xi- X, II < Cb Q ^j E n [d n ] E n [Q n ] = O p (n 2 b d bf) U + 

n n s \ / , 

Y^WXi-XiW < Cb Q \En[CvnCjn] = P (n 2 ^ 2 ) ( b* + — d 

Indeed, since $b_1$ goes to 0 under (A9), (A.15) and (A.16) yield



i=l 3=1 



nb° n 



(A.15) 
(A.16) 



which gives the result for the conditional covariance. Hence, it remains to prove (A.15) and (A.16). For
(A.15), note that by (A4) and Lemma 7.4-(7.2), we have



|En[Ctn]| = 



3,, 



< C6f E 



1 (X G # ) {m in - m(X)) 5 



1 (X G Xo) (m in - m(X)) 4 



K 



(2) f gj-e 

61 

1/2 
Hence from this bound and Lemma 7.7 we deduce



sup |E„[Ci„]E„[On]| < Cb\ sup E„ 

l<i j"<n 1 < i < n 



< 



Of (bf) bt + 



t {Xi e Xo) (m m - m(Xi)Y 
nti 



Therefore, since 

n n / \ 

^^l(||X i -X J ||<C6 )=Op«, 

i = l 3=1 ^ ' 

by the Tchebychev inequality, it then follows that

E 1 (\\ X * - X iH < C& o W [Cin] E„ [<,-„] = Op (n 2 &^?) (b 4 + -L 



(A.17) 



1=1 3=1 



which proves (A.15). For (A.16), set $Z_{in} = 1(X_i \in \mathcal{X}_0)(\hat m_{in} - m(X_i))^2$, and note that for $i \neq j$, we have
(2) fsi — e 



E 



(A.18) 



where 



E; 



(2) 



7 R''' 



£i - e 



A 



(2) f gj - e 
6i 



2ft„Ei 



(2) ( gj - e 
6i 



S 2 ^(2) f Ei-e 



The first term of (A.19) is treated by using Lemma 7.4-(7.2). This gives







i2 T 



K 



(2) ( S j-e 



< Cblp 



(A.19) 



(A.20) 



Since under (A4), the $\varepsilon_k$'s are independent centered variables, and are independent of the $X_k$'s, the second
term of (A.19) equals



1 (Xj e Xo) ^ (x k -x 3 



\ b 



E; 



(2) ( e l __e 
bi 



2/3, 



nb o9jn V b 



E, 



(2) / £»-e 



61 



Therefore, since $K_0$ is bounded under (A6), the equality above and Lemma 7.4-(7.2) imply that



2/3j n Em 



T,j n K 1 



■C2> 



'o9jr 



(A.21) 
For the last term of (A.19), we have



E; 
n n



Xk — Xj \ T ^ ( Xi — Xj 



SkSiKi 



(2) 



E K % 



2 / A k 



with, using Lemma 7.4-(7.2),



Xi, — 



r 2 K {2) I gi - g 
< max ^ sup

< 



, E[e 2 ]sup 



3, 



A' 



(2) 



Therefore 



E, 



S 2 K (2) / gj-e 



< 



d « - "\2 



fe=l,fe#j 

Substituting this bound, (A.21) and (A.20) in (A.19), we obtain



3, 



where 



M n = sup 

l<j<n 



Pjn + \Pjn\ 



1 \ ~> K 2 ( Xk - Xj 



Hence from (A.18), the Cauchy-Schwarz inequality, Lemma 7.7 and Lemma 7.4-(7.2), we deduce



^^lhXi-XjW <Cb ) |E„[C m C,„]| 

i=l 3=1 ^ ' 

n n y \ 

< CM n b\ ^X) 1 ( P* - X jH < C&o H 



t=i j=i 



35« 



< 1 ( j | Xi - X 3 1 1 < Cb ) El/ 2 [Zf n ] EV 2 



(2) / E J 



e» — e 



< Af„6?0 P (&g + -L) (M 1 / 2 E E fl (11^ - -XVII < Cb 



i=l 3 = 1 
Further, using (A.6) and Lemma 7.1, it can be shown that

Therefore, substituting this order in the inequality above, and using (A.17), we arrive at

n n / s / -. \ 2 

1 ( W X * - X oW < Cb K [OnCfn] = Op (n 2 6g^ /2 ) 6* + — ^ , 
i=l 3=1 ^ / \ / 

which proves (A.16) and completes the proof of the lemma. □
Figure 1: Curves of the densities $\tilde f_n$ (dashed line) and $\hat f_n$ (solid line) in the univariate case for $c_0 = 1$ and for
sample sizes $n = 50$ (left side) and $n = 100$ (right side). All the values of $\tilde f_n$ and $\hat f_n$ are calculated from 300
replicates of generated data.

Figure 2: Curves of the densities $\tilde f_n$ (dashed line) and $\hat f_n$ (solid line) in the trivariate case for $c_0 = 1$ and for
sample sizes $n = 100$ (left side) and $n = 200$ (right side). The values are computed from 300 simulation runs.
-1




0.25 


0.2380 


0.0080 


0.0647 


0.1592 


0.0062 


0.0316 




50 


0.5 


0.2380 


0.0080 


0.0647 


0.2185 


0.0069 


0.0547 
0.2380


0.0080 


0.0647 


0.2357 


0.0071 


0.0627 






0.25 


-0.0019 


0.0034 


0.0034 


-0.0038 


0.0027 


0.0027 




100 


0.5 


-0.0019 


0.0034 


0.0034 


-0.0026 


0.0034 


0.0034 
0.0030


0.0030 







0.25 


0.3843 


0.0106 


0.1583 
0.0038


0.0038 


-0.0042 


0.0033 


0.0033 




100 


0.5 


-0.0007 


0.0038 


0.0038 


-0.0058 


0.0034 


0.0035 






1 


-0.0007 


0.0038 


0.0038 


-0.0063 


0.0033 


0.0034 



Table 1: The table compares some values of the bias, variance and mean square error of the estimators $\tilde f_n$
and $\hat f_n$ when the data are generated from Model 5.1. All these values are based on 300 simulation runs.
-0.0013


0.0035 


0.0036 


-0.1250 


0.0015 


0.0539 




100 


0.5 


-0.0013 


0.0035 


0.0036 


-0.0180 


0.0027 


0.0416 
0.0035


0.0036 


0.0064 


0.0027 


0.0483 
0.0020


0.0019 


0.0019 
0.0011


0.0531 
0.0020


0.0019 


0.0019 
0.0014
0.0329
0.0047


0.0047 
0.1451
-0.0049


0.0047 
-0.0377


0.0044 


0.3318 






0.25 


-0.0024 


0.0030 


0.0030 


-0.1713 


0.0028 


0.0378 
0.5


-0.0024 


0.0030 


0.0030 


-0.0764 


0.0030 


0.1817 






1 


-0.0024 


0.0030 


0.0030 


-0.0297 


0.0026 


0.3591 


1 




0.25 


-0.0020 


0.0031 


0.0031 


0.0341 


0.0031 


0.0419 




100 


0.5 


-0.0020 


0.0031 


0.0031 


-0.0131 


0.0025 


0.0033 






1 


-0.0020 


0.0031 


0.0031 


-0.0010 


0.0028 


0.0416 






0.25 


-0.0064 


0.0019 


0.0019 


0.0239 


0.0020 


0.0325 




200 


0.5 


-0.0006 


0.0019 


0.0019 


-0.0101 


0.0016 


0.0062 






1 


-0.0006 


0.0019 


0.0019 


-0.0126 


0.0016 


0.0477 



Table 2: The table gives some values of the bias, variance and mean square error of $\tilde f_n$ and $\hat f_n$ when data
are generated from Model 5.2. All values are based on 300 replications of simulated data.
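To make the two-step construction behind these comparisons concrete, here is a minimal Python sketch of a residual-based error density estimator. It assumes a leave-one-out Nadaraya-Watson fit in the first step, as suggested by the quantities $\hat m_{in}$ handled in the appendix, followed by a kernel density estimate of the residuals. The kernels, the function names and the `keep` mask (standing for the restriction to an inner set $\mathcal{X}_0$) are illustrative assumptions and do not reproduce the simulation settings of the paper.

```python
import numpy as np


def k0(z):
    """Product kernel on [-1/2, 1/2]^d (illustrative Epanechnikov-type choice)."""
    w = np.where(np.abs(z) <= 0.5, 1.5 * (1.0 - 4.0 * z**2), 0.0)
    return w.prod(axis=-1)


def k1(v):
    """Smooth compactly supported kernel (315/256)(1 - v^2)^4 on [-1, 1]."""
    return np.where(np.abs(v) <= 1.0, (315.0 / 256.0) * (1.0 - v**2) ** 4, 0.0)


def loo_residuals(x, y, b0):
    """First step: leave-one-out Nadaraya-Watson fit, then residuals Y_i - m_hat_in.

    x has shape (n, d), y has shape (n,). Returns the residuals and a boolean
    mask of the points that have at least one neighbour in the kernel window.
    """
    diff = (x[None, :, :] - x[:, None, :]) / b0  # diff[i, j] = (X_j - X_i) / b0
    w = k0(diff)
    np.fill_diagonal(w, 0.0)  # drop the i-th observation from its own fit
    denom = w.sum(axis=1)
    ok = denom > 0
    m_hat = np.full(len(y), np.nan)
    m_hat[ok] = (w[ok] @ y) / denom[ok]
    return y - m_hat, ok


def error_density(e_grid, resid, keep, b1):
    """Second step: kernel density estimate of the residuals selected by `keep`."""
    r = resid[keep]
    u = (r[None, :] - e_grid[:, None]) / b1
    return k1(u).sum(axis=1) / (b1 * len(r))
```

In use, `keep` would combine the mask returned by `loo_residuals` with an indicator that $X_i$ lies in the chosen inner set, and `b0`, `b1` would be picked following the orders discussed in Theorems 4.2 and 4.3.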